-
Notifications
You must be signed in to change notification settings - Fork 58
Database support for delegating IP Pools for Oxide internal use #9103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
f3c16d2
to
15bc99a
Compare
- Add database queries for linking any number of IP Pools to the Oxide internal Silo, rather than assuming there's just one. - Rework queries for linking / unlinking IP Pool and Silo, making them concurrency-safe and preventing deletion when there are IPs or we're trying to delete the last internal IP Pool. - The public API does not change here. We're still assuming exactly one IP Pool per IP version for internal usage, and not allowing the link or pool to appear in the API. That'll be a follow-up commit. - Fixes #8948
15bc99a
to
44a2d02
Compare
I'm actually rethinking this a bit. There's a lot of complexity here, mostly because we use the This makes the database queries pretty complicated, since we're trying to enforce that a pool is one of:
This also means you can't just look at the I think I want to restructure this. I want to add an All that to say, I'm moving this to a draft now, and I'll un-draftify it when I'm happy again. |
So would a delegated pool be disallowed from being linked? Or could a pool do both? |
That would be disallowed, just like it is today and in this PR. A pool is delegated to Oxide XOR linkable to customer silos. |
- Add an `is_delegated` column to the `ip_pool` table - Enforce that pools are delegated XOR linkable to customer silos by looking at this table and using conditional update queries.
Alright, I've marked this ready for review now. The latest commits add an |
…ip-pool-delegation
nexus/db-model/src/ip_pool.rs
Outdated
pub rcgen: i64, | ||
|
||
/// True if the IP Pool has been delegated for Oxide use. | ||
pub is_delegated: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coming at this review without much context, I'm wondering if is_delegated
is too vague/generic and should say what it's delegated to, although I don't know how to be more specific without making it a lot longer. is_delegated_to_services
? is_delegated_to_oxide
? Entirely possible this is fine as-is and "look at the doc comment to see what it's delegated to", especially if we don't think we'll ever have other kinds of IP pool delegation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jgallagher and I talked a bit live about this. I think we'll change the name here to something like purpose
or intended_use
, and make this an enum rather than a bool. We'd have two variants today, indicating the pool can be used for external silos or for Oxide internal use. We can add variants if we need in the future for expressing things like "use by Nexus" or "use by external DNS".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated all this in e4ff3b8.
// NOTE: It'd be better to do one roundtrip to the DB, but this is | ||
// rarely-used right now. We also want to return the authz and database | ||
// objects, so we need the lookup-path mechanism. | ||
// TODO-remove: Use list_ip_pools_for_internal instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be addressed on this PR? Will callers of this method either fail (if the original built-in pools no longer exist) or succeed but with incorrect/incomplete information (if additional pools have been delegated)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's OK as-is, because you cannot actually change these pools yet. There's no API in Nexus for doing that. I have a follow-up PR in the works which makes the delegation API public, which also changes how this method works.
.await | ||
.unwrap(); | ||
|
||
// We should be able to delete one of these. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This allows the delegated pools to be down to only IPv6 - do we know that works throughout the rest of the control plane? (E.g., I don't think reconfigurator cares, but AFAIK we've never tested an IPv6 only setup.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the same comment as above applies -- this tests the internal database logic, but you can't exercise this through an API yet. I don't think it's possible to get into a situation in a real deployment where there is only an IPv6 pool. Even if there were a way to delete the IPv4 pool, I think that'd fail because there are addresses allocated out of it anyway.
- Add an enum for IP Pool reservation type, currently just representing internal or external uses. - Rework queries to be phrased in terms of "reserving" a pool for a specific use and checking for valid reservation types at query time. - Add expectorate tests for all complicated queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good; I like the enum over the boolean quite a lot. Mostly just have more vague questions for you. 😅
) | ||
AND CAST( | ||
IF( | ||
(SELECT count(1) FROM ip_pool WHERE time_deleted IS NULL AND reservation_type = $5 LIMIT 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just thinking out loud - I think in practice this extra guard is probably not necessary? The guard above that we can't change the type of a pool if there are any services using IPs from it will prevent deleting any in-use pool, and there will always be at least one in-use pool (otherwise we wouldn't have any services with external IPs at all), right? But this seems super cheap and maybe makes it very clear that we always have to have at least one, so I'm on board with keeping it anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the checks are doing slightly different things. The first one ensures that we don't delete the pool if there are addresses used from it. This one checks that there are at least two pools of this same reservation type. Those could be different in a future where we make more fine-grained distinctions between the types of internal pools, like NTP, DNS, etc. It might still be the case that the address-use guard is sufficient, but I'm with you that this is cheap and conservative :)
NOT ( | ||
resource_type = 'silo' AND | ||
resource_id = '001de000-5110-4000-8000-000000000001' AND | ||
is_default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is removing this constraint correct as a part of this PR? I'm not sure how reservation_type
/ multiple external pools interacts with is_default
. Will we let people set a default external service pool?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question -- I put this in place recently when I thought we'd reserve pools by linking them to the internal silo. Since we're not doing that anymore, it seemed more confusing than helpful to keep the constraint.
However! I realize I need a data migration to remove those links to the internal silo. I'll add that now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the deletion in beb5523
AND reservation_type = $3 | ||
AND CAST( | ||
IF( | ||
EXISTS(SELECT 1 FROM ip_pool_resource WHERE ip_pool_id = $4 LIMIT 1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More thinking out loud - this all looks correct to me, but I'm vaguely worried that our checks to switch between reservation types are so different. If I want to change something away from "for external silos", I have to check the ip_pool_resource
table, but if I want to change something away from "for Oxide internal", I have to check the external_ip
table. Should rows in external_ip
also have corresponding rows in ip_pool_resource
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should rows in
external_ip
also have corresponding rows inip_pool_resource
?
I think this is true for addresses used by external silo resources, like instances. So to move a pool away from "for Oxide internal", we couldn't use a check against that table. Not to confuse things further, but we could do that if we went back to linking the pool to the internal silo when we reserve it "for Oxide internal".
Uh oh!
There was an error while loading. Please reload this page.